Some Technical Aspects about Aligning Near Languages
Authors
Abstract
IULA at UPF has developed an aligner that draws on corpus processing results to produce an accurate and robust alignment, even with noisy parallel corpora. It compares the lemmata and part-of-speech tags of analysed texts, and it has two main characteristics. First, it apparently works only for near languages; second, it requires morphological taggers for the languages being compared. These two characteristics prevent the technique from being used for arbitrary language pairs. Whenever it is applicable, it achieves high-quality results.
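The idea of comparing lemmata and part-of-speech tags can be illustrated with a minimal sketch. This is not the IULA aligner itself, whose algorithm the abstract does not describe; it only assumes that each sentence has already been tagged into (lemma, POS) pairs and that near languages share similar-looking lemmata. The names `token_similarity` and `sentence_score`, the string-similarity measure, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A token is a (lemma, pos) pair, as a morphological tagger might emit it.
# This representation is an assumption, not the aligner's actual format.
Token = Tuple[str, str]


def token_similarity(a: Token, b: Token) -> float:
    """Return the lemma string similarity if the POS tags match, else 0.0."""
    if a[1] != b[1]:
        return 0.0
    return SequenceMatcher(None, a[0], b[0]).ratio()


def sentence_score(src: List[Token], tgt: List[Token],
                   threshold: float = 0.6) -> float:
    """Fraction of source tokens that have a similar enough target token.

    The threshold is an arbitrary illustrative value.
    """
    if not src or not tgt:
        return 0.0
    matched = sum(
        1 for s in src
        if max(token_similarity(s, t) for t in tgt) >= threshold
    )
    return matched / len(src)


# Hypothetical Spanish and Catalan tokens with matching POS tags.
spanish = [("gato", "NOUN")]
catalan = [("gat", "NOUN")]
score = sentence_score(spanish, catalan)
```

A sketch like this also shows why both constraints in the abstract arise: without a tagger there are no lemmata or tags to compare, and for distant languages the lemma strings would rarely resemble each other.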
Similar resources
Annotating Predicate-Argument Structure for a Parallel Treebank
Abstract We report on a recently initiated project which aims at building a multi-layered parallel treebank of English and German. Particular attention is devoted to a dedicated predicate-argument layer which is used for aligning translationally equivalent sentences of the two languages. We describe both our conceptual decisions and aspects of their technical realisation. We discuss some select...
Some Aspects about Seismology of 2012 August 11 Ahar-Vaezaghan (Azarbayjan, NW of Persia) Earthquakes Sequences
On 11 August 2012 (12:23 UTC), a moderate earthquake with MW=6.4 (USGS) occurred between the towns of Ahar and Varzaghan in Azarbayjan Province, in the northwest of Iran. Eleven minutes later, another earthquake with MW=6.2 (USGS) shook the area. These consecutive earthquakes were followed by intensive aftershock sequences, the strongest of which had MW=5.3 (USGS). In data processing including depth modific...
Questions related to Bitcoin and other Informational Money
A collection of questions about Bitcoin and its hypothetical relatives Bitguilder and Bitpenny is formulated. These questions concern technical issues about protocols, security issues, issues about the formalizations of informational monies in various contexts, and issues about forms of use and misuse. Some questions are formulated in the more general setting of informational monies and near-mo...
Digital Talking Books in Multiple Languages and Varieties
This paper describes our work in digital talking book alignment, starting with our earlier efforts to align books in European Portuguese and ending with the two challenges we currently face: aligning books in different varieties of Portuguese and aligning parallel books in different languages. Our alignment module proved robust enough for porting to other varieties of Portugu...
Deterministic Fuzzy Automaton on Subclasses of Fuzzy Regular ω-Languages
In formal language theory, we are mainly interested in the natural language computational aspects of ω-languages. Therefore in this respect it is convenient to consider fuzzy ω-languages. In this paper, we introduce two subclasses of fuzzy regular ω-languages called fuzzy n-local ω-languages and Buchi fuzzy n-local ω-languages, and give some closure properties for those subclasses. We define a ...